The Policy Improvement Algorithm: General Theory
Abstract
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy improvement algorithm generates a sequence of policies which are c-regular (a strong stability condition), where c is the cost function under consideration. This result requires only the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and "nearly" decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations, and hence an optimal average cost policy. These results shed new light on the optimal scheduling problem for multiclass queueing networks. Surprisingly, it is found that the formulation of optimal policies for a network is closely linked to the optimal control of its associated fluid model. Moreover, the relative value function for the network control problem is closely related to the value function for the fluid network. These results are surprising since randomness plays such an important role in network performance.
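As a rough illustration of the policy improvement algorithm described in the abstract, the sketch below runs average-cost policy iteration on a small finite model. It is not the paper's setting: the paper treats general state spaces with unbounded cost, whereas this example assumes a finite, truncated single-server queue in which every policy is irreducible. The queue model, the parameter values, and all function names (`build_queue`, `evaluate`, `improve`, `policy_iteration`) are hypothetical choices made for illustration. Each iteration solves Poisson's equation h(x) + eta = c(x, a) + sum_y P(y | x, a) h(y) for the current policy, normalized by h(0) = 0, and then picks the minimizing action in every state.

```python
def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def build_queue(N=10, lam=0.3, mu=(0.35, 0.6), fee=(0.0, 3.0)):
    """Hypothetical model: queue with buffer N and two service speeds.

    P[a][x][y] is the transition probability, c[x][a] = x + fee[a] the cost.
    """
    n = N + 1
    P = [[[0.0] * n for _ in range(n)] for _ in mu]
    c = [[x + fee[a] for a in range(len(mu))] for x in range(n)]
    for a, m in enumerate(mu):
        for x in range(n):
            up = lam if x < N else 0.0      # arrival (blocked when full)
            down = m if x > 0 else 0.0      # departure (none when empty)
            if x < N:
                P[a][x][x + 1] = up
            if x > 0:
                P[a][x][x - 1] = down
            P[a][x][x] = 1.0 - up - down
    return P, c


def evaluate(P, c, policy):
    """Policy evaluation: solve Poisson's equation with h(0) = 0.

    Unknowns are (eta, h(1), ..., h(n-1)); returns (eta, h).
    """
    n = len(policy)
    A, b = [], []
    for x in range(n):
        a = policy[x]
        row = [1.0] + [(1.0 if y == x else 0.0) - P[a][x][y]
                       for y in range(1, n)]
        A.append(row)
        b.append(c[x][a])
    z = solve_linear(A, b)
    return z[0], [0.0] + z[1:]


def improve(P, c, h, policy):
    """Policy improvement: minimize c(x, a) + sum_y P(y | x, a) h(y)."""
    n = len(h)
    new_policy = []
    for x in range(n):
        def q(a, x=x):
            return c[x][a] + sum(P[a][x][y] * h[y] for y in range(n))
        best = min(range(len(P)), key=q)
        # Keep the current action on (near) ties so the iteration cannot cycle.
        if q(policy[x]) <= q(best) + 1e-12:
            best = policy[x]
        new_policy.append(best)
    return new_policy


def policy_iteration(P, c, policy, max_iter=100):
    """Alternate evaluation and improvement until the policy is stable."""
    history = []  # average cost of each policy visited
    for _ in range(max_iter):
        eta, h = evaluate(P, c, policy)
        history.append(eta)
        new_policy = improve(P, c, h, policy)
        if new_policy == policy:
            return policy, eta, h, history
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


if __name__ == "__main__":
    P, c = build_queue()
    policy, eta, h, history = policy_iteration(P, c, [0] * len(c))
    print("policy:", policy)
    print("average cost per iteration:", history)
```

In this finite sketch the recorded average costs should be non-increasing from one policy to the next, which is a much weaker, finite-state analogue of the convergence behavior the abstract describes. The stability questions the paper addresses (c-regularity, unbounded cost) do not arise here because the state space is truncated.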
Similar resources
Monetary and Fiscal Policy Interaction in Iran: A Dynamic Stochastic General Equilibrium Approach
Achieving the goals of price stability, sustainable economic growth, and the improvement of many economic variables requires coordination between the monetary and financial authorities. In this study, a modified New Keynesian dynamic stochastic general equilibrium model is introduced for Iran and in the framework of game theory, optimal policy of fiscal and monetary authorities are d...
Analysis of Ahmadinejad Government's Foreign Policy According to the Critical Theory of International Relations
The goal of this article is to analyze the foreign policy of the Ahmadinejad government according to the critical theory of international relations. The foreign policy of Mahmood Ahmadinejad's government appears to differ from that of the governments before and after it. Perhaps a more detailed and objective analysis could be provided of the foreign policy of the Islamic Republic of Iran during Ahmadinejad ...
Safe Policy Iteration
CONTRIBUTIONS 1. Theoretical contribution. We introduce a new, more general lower bound to the policy improvement of an arbitrary policy compared to another policy based on the ability to bound the distance between the future state distributions. 2. Algorithmic contribution. We define two approximate policy–iteration algorithms whose policy improvement moves toward the estimated greedy policy b...
The Conceptual Model of the principals Competency Development in Secondary School, grounded Theory
The purpose of this study was to develop a conceptual model for the competence of high school principals in Tehran province. This qualitative research was carried out using a strategy based on the grounded theory. In this regard, using a targeted approach and theoretical saturation criterion, semi-structured interviews with 17 people (7 faculty members specialized in the field of educational ma...
Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space
Policy iteration (PI) is a recursive process of policy evaluation and improvement to solve an optimal decision-making, e.g., reinforcement learning (RL) or optimal control problem and has served as the fundamental to develop RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...
Resource Based View: A Promising New Theory for Healthcare Organizations; Comment on “Resource Based View of the Firm as a Theoretical Lens on the Organisational Consequences of Quality Improvement”
This commentary reviews a recent piece by Burton and Rycroft-Malone on the use of Resource Based View (RBV) in healthcare organizations. It first outlines the core content of their piece. It then discusses their attempts to extend RBV to the analysis of large scale quality improvement efforts in healthcare. Some critique is elaborated. The broader question of why RBV seems to be migrating into ...